Overview

Objective

The objective of this package and research is to assess the relationships between habitat quality and biodiversity in a highly fragmented and human-dominated forested landscape in Massachusetts, through a geospatial analysis lens, integrating spatial analysis, remote sensing, and soundscape ecology

Our study is part of a larger project conducted by Florencia Sangermano to advance the conceptual understanding of how landscape pattern and landscape pattern changes relate to biodiversity within a highly anthropogenically modified ecosystem, informing on the relationships and spatial metrics more useful to monitor biodiversity changes. By integrating landscape ecology with soundscape ecology, the project will contribute to the applications of remote sensing technologies in conservation planning and monitoring of habitat quality.

In order to do this, we created Naturewave. Naturewave is a package created to assess the correlation between anthropogenic variables and acoustic indices at multiple buffers sizes (500, 1000, 1500, 200, 2500, and 3000) for 11 sampling locations across Massachusetts.

Within the package we identify the scale of best influence (the buffer size where acoustic indices are peak) across each sampling location by examining the R2 values of each of the variables at different buffer sizes.

Anthropogenic Variables of Interest

  • Imperviousness

    raw: http://jamba.provost.ads.umass.edu/web/CAPS2011/tiffzips/metricsraw/imperv.zip
  • Connectedness

    raw:http://jamba.provost.ads.umass.edu/web/CAPS2011/tiffzips/metricsraw/connect.zip
  • Vegetative structure

      http://jamba.provost.ads.umass.edu/web/caps2011/tiffzips/settings/structure.zip
  • Road traffic

    raw:http://jamba.provost.ads.umass.edu/web/CAPS2011/tiffzips/metricsraw/traffic.zip

Acoustic Indices:

  • NDSI (Normalized Difference Soundscape Index (NDSI)
  • (Biophony - Anthrophony)/(Biophony + Anthrophony)

Once the correlation is established, we use random forest regression to examine the importance of the variables of interest and predict NDSI values across the study area.

Methodology

Preliminary Work

Approach used to formulate sound indices and habitat quality measures:

  • Wildlife Acoustic SM4 sound recorders were installed during June and July of 2019 and programmed to extract audio samples from 4 AM-6 AM for five days. A total of eleven sites were sampled, including 10 samples in Mass Audubon sites throughout central Massacchusetts (MA), and one sample in Leadmine Mountain forest in Sturbridge (MA).
  • From these soundscapes, the Normalized Difference Soundscape Index, (NDSI), Biophony (sounds from biodiversity), Anthrophony (sounds from humans), Bioacoustics Index (BI), Acoustic Complexity Index (ACI), Acoustic Diversity Index (ADI), and Acoustic Evenness Index (AEI) were calculated. The relationship between these indices and habitat quality measures were extracted.
  • Habitat quality components were identified as the vegetation productivity, measured as the normalized difference vegetation index (NDVI), built-up magnitude, measured as the normalized difference-built index NDBI, and traffic influence, measured as distance from primary, secondary, and tertiary roads.
  • NDVI and NDBI were calculated using Sentinel-2 imagery from Google Earth Engine. Relationships of NDBI and NDBI were extracted at multiple distance buffers from recoding sites to evaluate the scale at which these habitat quality measures had the highest influence on biodiversity.
  • After entering in the values into linear models, the r^2 values were derived and then plotted out as seen in Figure 1 below.

Code Chunk 1: Pearson correlation coefficient for each measured index at multiple scales.

Prelim Results

  • Based on the results of the above examination, it was determined that NDSI had the strongest relationship with NDVI and NDBI. Given the result we relied on NDSI as our proxy to use in examining the relationship between landscape/ecological variables of interest and acoustic indices.

New Linear Models

  • We used a combination of linear regression methods to assess the correlation of variables and sound indices, and identified the buffer size or scale of best influence for NDSI values across the study area using the Pearson’s Correlation Coefficient for each measured variable at multiple scales/buffers.

Code Chunk 2: Code used to create linear models between NDSI and the variables of interest

  • Once we identified the buffer size representing the highest scale of influence (in our case given 2500m) we used a focalWeight circle to replicate the 2500m buffer and passed it over the entirety of the images for imperv, connect, and structure to obtain a weighted matrix for each. We then used the focal function to pass over each image pixel, and multiplied those weights by each pixel value in the neighborhood, and then summed those to get the mean value raster image for each variable across the entire study area.

Code Chunk 3: Focal Mean using a circular focalweight at a buffer of 2500 m

  • Random forest regression was used to evaluate the variables and find out their order of importance at predicting NDSI values. Finally, we created a model that takes NDSI values at 2500 meters buffer from the sampling location in combination with the mean value raster images in order to estimate NDSI values for the entire study area.

Results

Results from linear correlation and R squares.

By examining the R square values from the linear correlation using Pearson’s Correlation Coefficient between the variables of interest and NDSI, we were able to identify the scale at which NDSI values were most significant. At 2500 meters from the sampling location, NDSI values can be considered as representative for the landscape across the study area.

Code Chunk 5: Code to run R^2 Plot of Variables vs NDSI Sound Index

Random Forest Regression Results

We selected four independent variables (Imperviousness, Connectedness, Vegetative Structure, and Traffic) to test their level of importance in predicting NDSI values. The results of the random forest regression at different buffer sizes or scale of best influence revealed that Traffic and Imperviousness are the variables with a higher explanatory power at predicting NDSI values across the study area. We determined this by looking at the Variable Importance Plot which list Traffic and Imperviousness at the top of the variables list. By looking at the %IncMSE at the 2500 scale of best influence, Mean square-error (%IncMSE) and node purity (incNodePurity) are the measurements returned after running the RandomForest algorithm. %IncMSE provides the prediction ability of mean square error with randomly permuted variables; whereas, IncNodePurity calculates the loss when best splits are determined by a given variable. Traffic has a %IncMSE of 9.61 and Imperviousness has a %IncMSE value of 6.69. These two variables are the most important in the regression. They are the ones used to better split the nodes of the regression and achieve higher increases in node purities.

Code Chunk 6: Variable Importance Results of RF Regression

##               %IncMSE IncNodePurity
## Imperv           6.69          0.10
## Connectedness    5.75          0.13
## Structure        5.54          0.11
## Traffic          9.61          0.11

Predictive Modeling Results:

The result from modeling is a continuous surface predicting NDSI values across the landscape based on the values observed at the 2500 meters buffer, which was identified as the scale of best influence. The predictive NDSI surface is biophyscally reasonable because it indicates high concentrations of NDSI values away from areas that are highly developed and have high levels of traffic. Vegetation structure and connectedness are also contributors to NDSI values but to a less extent compared to the imperviousness and traffic. Vegetation structure and connectedness provide good indicators for habitat assessment and connectivity.

Code Chunk 7: Code to Plot out the Predicted NDSI Values using Leaflet

## class      : RasterLayer 
## dimensions : 2513, 2124, 5337612  (nrow, ncol, ncell)
## resolution : 30, 30  (x, y)
## extent     : 133470, 197190, 863430, 938820  (xmin, xmax, ymin, ymax)
## crs        : +proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
## source     : memory
## names      : layer 
## values     : 0.2626267, 0.7772814  (min, max)
## Reading layer `sample_sites_projected' from data source `C:\Users\MVanDenburg\Documents\Spatial_Analysis_R\naturewave\inst\extdata\sample_sites_projected.shp' using driver `ESRI Shapefile'
## Simple feature collection with 11 features and 5 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 146419.9 ymin: 869709.6 xmax: 178745.5 ymax: 928660.8
## epsg (SRID):    NA
## proj4string:    +proj=lcc +lat_1=41.71666666666667 +lat_2=42.68333333333333 +lat_0=41 +lon_0=-71.5 +x_0=200000 +y_0=750000 +datum=NAD83 +units=m +no_defs

Conclusion

Discussion

As mentioned in the objectives, this study is part of a larger project with the larger goal of creating a characterization of the type of landscape structural changes happening in the Worcester plateau and the rate of changes. The next stage of the process is to test out further variables of interest within our model and refine it to create a better predictive model. The predictive model will be validated based on an independent acoustic sample collected during May 11 through July this summer (or next given the closure of parks and forests) following the same collection standards specified for the data collected for this study.

Identifying how habitat structure relates to acoustic diversity, as well as assessing the capacity of models to extrapolate these relationships, will provide a direct link between spatial changes in landscape structure and biodiversity. This information is essential to evaluate how urbanization is shaping remaining forest habitat in Massachusetts, as well as to identify areas of more rapid development and areas of shifting development trajectories.

R Packages Utilized

The soundecology, tuneR, and seewave packages were utilized in order to read in the sound files and analyze the sound frequencies by Biophony and Anthrophony. Sound indices were calculated from these frequencies using fuctions within the package to formulate the NDSI, BI, ACI, ADI, and AEI indices for the +1200 records for the 11 sites. The tidyverse library was loaded in order for us to clean and “tidy” the data into clean csvs. The csvs were then exported and then re-read back in and converted into a singular dataframe organized by site. Linear models were then created to examine each sound index –in addition to anthrophony & biophony– vs the NDVI and NDBI results by buffer 500-3000 meters. The linear r squared values were extracted from each of the linear models and compiled into a dataframe organized by buffer size. The r^2 values were then plotted out as “Sound Indices r^2 values vs NDVI by Buffer” and “Sound Indices r^2 values vs NDBI by Buffer”.

A list of used packages in this stage of the analysis are listed in the description folder.

  • library(tidyverse)
  • library(tidyselect)
  • library(ggplot2)
  • library(soundecology)
  • library(tuneR)
  • library(seewave)
  • library(nlme)
  • library(landscapetools)
  • library(rasterVis)
  • libary(rgdal)
  • library(rgeos)
  • library(geospaar)
  • library(plotly)
  • library(sf)
  • library(lwgeom)
  • library(ggplot2)
  • library(raster)
  • library(tidyverse)
  • library(dplyr)
  • library(caTools)
  • library(randomForest)
  • library(leaflet)
  • library(nlme)

Timelines

Provide a timeline for when each portion of the analysis will be completed. These timelines should be constructed relative to the time period of presentations (during the last two weeks of class) and final project submission (during exam week). For teams, names should be associated with each step on the timeline.

  • 4/13 – Both–Draft of Vignette submitted to Github, send screenshots of code to Estes
  • 4/15 – Gabo– Downloads Repo and is able to make commits remotely
  • 4/17 – Miles– Complete linear correlation between variables and Acoustic indices at multiple buffers – to identify the scale of influence
    • More specifically:
      • Complete Obtaining values for variables of interest at buffer sizes 500-3000
      • Imperviousness
      • connectedness
      • vegetative structure
      • Run Linear correlations and R^2 analyses between variables and acoustic indices
      • Identify peaks

Refine code and results

  • 4/22 – Gabo– Prepare to run Random Forest Regression
  • 4/27 – BOTH– Interpret results and clean up code

Anticipated outcomes

  • Random forest regression between metrics (independent) at the scale of best influence (average filter with kernel size equal to the best buffer size) and indices (dependent).
    — Variables oncluded in the random forest model: — IUCN richness — NDSI – Biophony, Anthrophony (Already Calculated) — NDVI (Already calculated) — NDBI (Already calculated)

  • NDVI an NDBI Maps of Sites in Massachusets using R package reticulate to translate the work done in GEE to one environment– R –3000 buffers and one map each of one site at various buffer levels

– Plots of examined variables of interest: imperviousness, connectedness, a vegetative structure using r^2 and their correlations against the sound indices (NDSI, Biophony, Anthrophony)


Back to top